## TFSA-2021-102: Interpreter crash from `tf.io.decode_raw`

### CVE Number
CVE-2021-29614

### Impact
The implementation of `tf.io.decode_raw` produces incorrect results and crashes
the Python interpreter when combining `fixed_length` and wider datatypes.

```python
import tensorflow as tf

tf.io.decode_raw(tf.constant(["1","2","3","4"]), tf.uint16, fixed_length=4)
```

The [implementation of the padded
version](https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc)
is buggy due to a confusion about pointer arithmetic rules.

First, the code
[computes](https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L61)
the width of each output element by dividing the `fixed_length` value to the
size of the type argument:

```cc
int width = fixed_length / sizeof(T);
```

The `fixed_length` argument is also used to determine the [size needed for the
output
tensor](https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L63-L79):

```cc
TensorShape out_shape = input.shape();
out_shape.AddDim(width);
Tensor* output_tensor = nullptr;
OP_REQUIRES_OK(context, context->allocate_output("output", out_shape, &output_tensor));

auto out = output_tensor->flat_inner_dims<T>();
T* out_data = out.data();
memset(out_data, 0, fixed_length * flat_in.size());
```

This is followed by [reencoding
code](https://github.com/tensorflow/tensorflow/blob/1d8903e5b167ed0432077a3db6e462daf781d1fe/tensorflow/core/kernels/decode_padded_raw_op.cc#L85-L94):

```cc
for (int64 i = 0; i < flat_in.size(); ++i) {
  const T* in_data = reinterpret_cast<const T*>(flat_in(i).data());

  if (flat_in(i).size() > fixed_length) {
    memcpy(out_data, in_data, fixed_length);
  } else {
    memcpy(out_data, in_data, flat_in(i).size());
  }
  out_data += fixed_length;
}
```

The erroneous code is the last line above: it is moving the `out_data` pointer
by `fixed_length * sizeof(T)` bytes whereas it only copied at most
`fixed_length` bytes from the input. This results in parts of the input not
being decoded into the output.

Furthermore, because the pointer advance is far wider than desired, this quickly
leads to writing to outside the bounds of the backing data. This OOB write leads
to interpreter crash in the reproducer mentioned here, but more severe attacks
can be mounted too, given that this gadget allows writing to periodically placed
locations in memory.

### Patches
We have patched the issue in GitHub commit
[698e01511f62a3c185754db78ebce0eee1f0184d](https://github.com/tensorflow/tensorflow/commit/698e01511f62a3c185754db78ebce0eee1f0184d).

The fix will be included in TensorFlow 2.5.0. We will also cherrypick this
commit on TensorFlow 2.4.2, TensorFlow 2.3.3, TensorFlow 2.2.3 and TensorFlow
2.1.4, as these are also affected and still in supported range.

### For more information
Please consult [our security
guide](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for
more information regarding the security model and how to contact us with issues
and questions.
